机器学习中的隐私和安全挑战(ML)已成为ML普遍的开发以及最近对大型攻击表面的展示,已成为一个关键的话题。作为一种成熟的以系统为导向的方法,在学术界和行业中越来越多地使用机密计算来改善各种ML场景的隐私和安全性。在本文中,我们将基于机密计算辅助的ML安全性和隐私技术的发现系统化,以提供i)保密保证和ii)完整性保证。我们进一步确定了关键挑战,并提供有关ML用例现有可信赖的执行环境(TEE)系统中限制的专门分析。我们讨论了潜在的工作,包括基础隐私定义,分区的ML执行,针对ML的专用发球台设计,TEE Awawe Aware ML和ML Full Pipeline保证。这些潜在的解决方案可以帮助实现强大的TEE ML,以保证无需引入计算和系统成本。
translated by 谷歌翻译
Simulation-based falsification is a practical testing method to increase confidence that the system will meet safety requirements. Because full-fidelity simulations can be computationally demanding, we investigate the use of simulators with different levels of fidelity. As a first step, we express the overall safety specification in terms of environmental parameters and structure this safety specification as an optimization problem. We propose a multi-fidelity falsification framework using Bayesian optimization, which is able to determine at which level of fidelity we should conduct a safety evaluation in addition to finding possible instances from the environment that cause the system to fail. This method allows us to automatically switch between inexpensive, inaccurate information from a low-fidelity simulator and expensive, accurate information from a high-fidelity simulator in a cost-effective way. Our experiments on various environments in simulation demonstrate that multi-fidelity Bayesian optimization has falsification performance comparable to single-fidelity Bayesian optimization but with much lower cost.
translated by 谷歌翻译
With the growth of editing and sharing images through the internet, the importance of protecting the images' authorship has increased. Robust watermarking is a known approach to maintaining copyright protection. Robustness and imperceptibility are two factors that are tried to be maximized through watermarking. Usually, there is a trade-off between these two parameters. Increasing the robustness would lessen the imperceptibility of the watermarking. This paper proposes an adaptive method that determines the strength of the watermark embedding in different parts of the cover image regarding its texture and brightness. Adaptive embedding increases the robustness while preserving the quality of the watermarked image. Experimental results also show that the proposed method can effectively reconstruct the embedded payload in different kinds of common watermarking attacks. Our proposed method has shown good performance compared to a recent technique.
translated by 谷歌翻译
In this paper, deep-learning-based approaches namely fine-tuning of pretrained convolutional neural networks (VGG16 and VGG19), and end-to-end training of a developed CNN model, have been used in order to classify X-Ray images into four different classes that include COVID-19, normal, opacity and pneumonia cases. A dataset containing more than 20,000 X-ray scans was retrieved from Kaggle and used in this experiment. A two-stage classification approach was implemented to be compared to the one-shot classification approach. Our hypothesis was that a two-stage model will be able to achieve better performance than a one-shot model. Our results show otherwise as VGG16 achieved 95% accuracy using one-shot approach over 5-fold of training. Future work will focus on a more robust implementation of the two-stage classification model Covid-TSC. The main improvement will be allowing data to flow from the output of stage-1 to the input of stage-2, where stage-1 and stage-2 models are VGG16 models fine-tuned on the Covid-19 dataset.
translated by 谷歌翻译
Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date fell short in aberration detection. Using a training set of ~10k patient specimens and ~50k karyograms from over 5 years from the Fred Hutchinson Cancer Center, we created a labeled set of images representing individual chromosomes. These individual chromosomes were used to train and assess deep learning models for classifying the 24 human chromosomes and identifying chromosomal aberrations. The top-accuracy models utilized the recently introduced Topological Vision Transformers (TopViTs) with 2-level-block-Toeplitz masking, to incorporate structural inductive bias. TopViT outperformed CNN (Inception) models with >99.3% accuracy for chromosome identification, and exhibited accuracies >99% for aberration detection in most aberrations. Notably, we were able to show high-quality performance even in "few shot" learning scenarios. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). When applied to "zero shot" scenarios, the model captured aberrations without training, with perfect precision at >50% recall. Together these results show that modern deep learning models can approach expert-level performance for chromosome aberration detection. To our knowledge, this is the first study demonstrating the downstream effectiveness of TopViTs. These results open up exciting opportunities for not only expediting patient results but providing a scalable technology for early screening of low-abundance chromosomal lesions.
translated by 谷歌翻译
Time series anomaly detection has applications in a wide range of research fields and applications, including manufacturing and healthcare. The presence of anomalies can indicate novel or unexpected events, such as production faults, system defects, or heart fluttering, and is therefore of particular interest. The large size and complex patterns of time series have led researchers to develop specialised deep learning models for detecting anomalous patterns. This survey focuses on providing structured and comprehensive state-of-the-art time series anomaly detection models through the use of deep learning. It providing a taxonomy based on the factors that divide anomaly detection models into different categories. Aside from describing the basic anomaly detection technique for each category, the advantages and limitations are also discussed. Furthermore, this study includes examples of deep anomaly detection in time series across various application domains in recent years. It finally summarises open issues in research and challenges faced while adopting deep anomaly detection models.
translated by 谷歌翻译
Bayesian optimization (BO) is increasingly employed in critical applications such as materials design and drug discovery. An increasingly popular strategy in BO is to forgo the sole reliance on high-fidelity data and instead use an ensemble of information sources which provide inexpensive low-fidelity data. The overall premise of this strategy is to reduce the overall sampling costs by querying inexpensive low-fidelity sources whose data are correlated with high-fidelity samples. Here, we propose a multi-fidelity cost-aware BO framework that dramatically outperforms the state-of-the-art technologies in terms of efficiency, consistency, and robustness. We demonstrate the advantages of our framework on analytic and engineering problems and argue that these benefits stem from our two main contributions: (1) we develop a novel acquisition function for multi-fidelity cost-aware BO that safeguards the convergence against the biases of low-fidelity data, and (2) we tailor a newly developed emulator for multi-fidelity BO which enables us to not only simultaneously learn from an ensemble of multi-fidelity datasets, but also identify the severely biased low-fidelity sources that should be excluded from BO.
translated by 谷歌翻译
The BIOSCAN project, led by the International Barcode of Life Consortium, seeks to study changes in biodiversity on a global scale. One component of the project is focused on studying the species interaction and dynamics of all insects. In addition to genetically barcoding insects, over 1.5 million images per year will be collected, each needing taxonomic classification. With the immense volume of incoming images, relying solely on expert taxonomists to label the images would be impossible; however, artificial intelligence and computer vision technology may offer a viable high-throughput solution. Additional tasks including manually weighing individual insects to determine biomass, remain tedious and costly. Here again, computer vision may offer an efficient and compelling alternative. While the use of computer vision methods is appealing for addressing these problems, significant challenges resulting from biological factors present themselves. These challenges are formulated in the context of machine learning in this paper.
translated by 谷歌翻译
In the last few years, various types of machine learning algorithms, such as Support Vector Machine (SVM), Support Vector Regression (SVR), and Non-negative Matrix Factorization (NMF) have been introduced. The kernel approach is an effective method for increasing the classification accuracy of machine learning algorithms. This paper introduces a family of one-parameter kernel functions for improving the accuracy of SVM classification. The proposed kernel function consists of a trigonometric term and differs from all existing kernel functions. We show this function is a positive definite kernel function. Finally, we evaluate the SVM method based on the new trigonometric kernel, the Gaussian kernel, the polynomial kernel, and a convex combination of the new kernel function and the Gaussian kernel function on various types of datasets. Empirical results show that the SVM based on the new trigonometric kernel function and the mixed kernel function achieve the best classification accuracy. Moreover, some numerical results of performing the SVR based on the new trigonometric kernel function and the mixed kernel function are presented.
translated by 谷歌翻译
由于临床实践所需的放射学报告和研究是在自由文本叙述中编写和存储的,因此很难提取相对信息进行进一步分析。在这种情况下,自然语言处理(NLP)技术可以促进自动信息提取和自由文本格式转换为结构化数据。近年来,基于深度学习(DL)的模型已适用于NLP实验,并具有令人鼓舞的结果。尽管基于人工神经网络(ANN)和卷积神经网络(CNN)的DL模型具有显着潜力,但这些模型仍面临临床实践中实施的一些局限性。变形金刚是另一种新的DL体系结构,已越来越多地用于改善流程。因此,在这项研究中,我们提出了一种基于变压器的细粒命名实体识别(NER)架构,以进行临床信息提取。我们以自由文本格式收集了88次腹部超声检查报告,并根据我们开发的信息架构进行了注释。文本到文本传输变压器模型(T5)和covive是T5模型的预训练域特异性适应性,用于微调来提取实体和关系,并将输入转换为结构化的格式。我们在这项研究中基于变压器的模型优于先前应用的方法,例如基于Rouge-1,Rouge-2,Rouge-L和BLEU分别为0.816、0.668、0.528和0.743的ANN和CNN模型,同时提供了一个分数可解释的结构化报告。
translated by 谷歌翻译